refactor(integration_tests): unify bridge models and clear dbt deprecations #1325
Open
rabee05 wants to merge 9 commits into tuva-health:main from
…thetic mode
* Rename patient_seed.csv to raw_data__patient.csv so every clinical input seed follows the raw_data__<table> convention that tuva_source() resolves to.
* Add empty header-only CSVs for condition, encounter, location, medication, practitioner, and procedure, giving each clinical table a seed relation in synthetic mode even when no synthetic data exists.
* Replace the repeated per-column cross-adapter jinja in seeds/_seeds.yml with YAML anchors (*string, *datetime, *float), dropping ~640 lines of duplication.
* Gate every raw_data__* seed on use_synthetic_data so non-synthetic runs no longer materialize unused seed tables.
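As a rough illustration of the seed layout this commit produces, the Python sketch below (not dbt code) maps each clinical table named above to its raw_data__<table> seed and marks which seeds would be header-only placeholders. The table list is limited to the seven tables this commit mentions, and the function names are hypothetical.

```python
# Hypothetical model of the seed layout; not part of the Tuva package or dbt.
# Only the tables named in this commit are listed; the project may have more.
CLINICAL_TABLES = (
    "condition", "encounter", "location", "medication",
    "patient", "practitioner", "procedure",
)


def seed_name(table: str) -> str:
    """Seed naming convention that tuva_source() resolves to in synthetic mode."""
    return f"raw_data__{table}"


def plan_seeds(tables_with_data: set[str], use_synthetic_data: bool) -> dict[str, str]:
    """Return {seed_name: "data" | "header-only"} for the seeds that would build."""
    if not use_synthetic_data:
        # Every raw_data__* seed is gated on the flag, so nothing materializes.
        return {}
    return {
        seed_name(table): "data" if table in tables_with_data else "header-only"
        for table in CLINICAL_TABLES
    }


plan = plan_seeds({"patient"}, use_synthetic_data=True)
print(plan["raw_data__patient"], plan["raw_data__encounter"])
```

The header-only entries matter because each one still gives tuva_source() a relation to point at, even though the table is empty.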
…sion-column checks
BigQuery rejects "cast(... as varchar)"; use the cross-adapter type
macro (dbt.type_string()) so the check_extension_columns_in_core_{eligibility,medical_claim,
member_months,pharmacy_claim} tests compile on every warehouse.
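The fix relies on adapter dispatch: the test asks for "the string type" and each warehouse renders its own spelling. The Python sketch below mimics that idea with a plain lookup. It is a simplification, not dbt's implementation of dbt.type_string(); the bigquery and databricks entries follow the commit notes in this PR, while the varchar default, the column name, and render_extension_check are assumptions for illustration.

```python
# Simplified stand-in for adapter dispatch; dbt resolves type_string() per
# adapter itself and may return different spellings than this table.
STRING_TYPE = {"bigquery": "string", "databricks": "string"}
DEFAULT_STRING_TYPE = "varchar"  # assumption for the remaining adapters


def type_string(adapter: str) -> str:
    """Return the string type spelling for a given adapter name."""
    return STRING_TYPE.get(adapter, DEFAULT_STRING_TYPE)


def render_extension_check(column: str, adapter: str) -> str:
    """Hypothetical check query: one template, adapter-specific cast."""
    return f"select cast({column} as {type_string(adapter)}) as value"


print(render_extension_check("member_id", "bigquery"))
```

With a hard-coded varchar, the bigquery rendering would be the one that fails; routing the type through a single function keeps one template valid everywhere.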
Collapse the repeated per-column cross-adapter jinja (bigquery/databricks string, athena/databricks real, fabric datetime2, etc.) into three shared anchors — *string, *datetime, *float — and reference them from every column_types entry. Also gate each synthetic_data__* seed on use_synthetic_data so non-synthetic runs stop materializing seeds nothing refs. Drops ~260 lines of repetition from seeds/synthetic_data/synthetic_data_seeds.yml without changing runtime type resolution.
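In YAML, an anchor such as &string defines a node once, and every *string alias reuses that same node, so a single edit updates every column that references it. The Python sketch below mirrors that structure with shared dict objects. The per-adapter spellings come from the commit message above; the default types and the column names are assumptions for illustration, and the real anchors are jinja expressions that dbt renders per target.

```python
# Hypothetical Python mirror of the three YAML anchors.
STRING = {"bigquery": "string", "databricks": "string", "default": "varchar"}
FLOAT = {"athena": "real", "databricks": "real", "default": "float"}
DATETIME = {"fabric": "datetime2", "default": "timestamp"}

# Like `column_types: {person_id: *string, ...}`: each entry points at the
# same shared object instead of carrying its own copy of the mapping.
COLUMN_TYPES = {
    "person_id": STRING,
    "claim_id": STRING,
    "paid_amount": FLOAT,
    "loaded_at": DATETIME,
}


def resolve(column: str, adapter: str) -> str:
    """Pick the adapter-specific type, falling back to the anchor's default."""
    mapping = COLUMN_TYPES[column]
    return mapping.get(adapter, mapping["default"])


print(resolve("loaded_at", "fabric"))
```

Because person_id and claim_id share one object, changing the string mapping once changes both, which is the property that let this commit drop the repeated blocks without changing resolved types.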
… at compile time
Returns ref('raw_data__<table>') when use_synthetic_data is true, or
source('source_input', <table>) otherwise. Because ref() is evaluated
at parse time, dbt wires the seed as an upstream dependency of every
bridge model that calls tuva_source(), so seeds run before their
dependent models without an on-run-start hook or explicit ordering.
Returns the Relation object (not a rendered string), so callers can
use either "from {{ tuva_source('X') }}" or bind it with {% set r =
tuva_source('X') %} for adapter.get_columns_in_relation(r).
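The ordering argument can be checked on a toy dependency graph. The sketch below uses Python's standard graphlib.TopologicalSorter as a stand-in for dbt's DAG: each bridge model gets a predecessor edge to raw_data__<table> only in synthetic mode, which is the edge the ref() call contributes. Model names are hypothetical, and source() nodes are left out because dbt does not build them.

```python
from graphlib import TopologicalSorter


def build_order(bridge_models: dict[str, str], use_synthetic_data: bool) -> list[str]:
    """Order nodes so every predecessor comes before the models that depend on it.

    bridge_models maps a (hypothetical) model name to the table it passes
    to tuva_source().
    """
    graph: dict[str, set[str]] = {}
    for model, table in bridge_models.items():
        # Synthetic mode: tuva_source() returns ref('raw_data__<table>'), which
        # makes the seed a predecessor. Otherwise it returns a source(), which
        # is not a buildable node, so no edge is modeled here.
        graph[model] = {f"raw_data__{table}"} if use_synthetic_data else set()
    return list(TopologicalSorter(graph).static_order())


order = build_order({"bridge_patient": "patient", "bridge_encounter": "encounter"}, True)
print(order.index("raw_data__patient") < order.index("bridge_patient"))
```

No explicit ordering rule appears anywhere in the sketch; the seeds land first purely because the edges exist, which is the same reason an on-run-start hook becomes unnecessary.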
… into single SELECT via tuva_source()
Every bridge model had two duplicated SELECTs gated by
use_synthetic_data. Replaced with one SELECT reading from
tuva_source('<table>') — the macro swaps ref() vs source() at
compile time so the toggle is invisible to the model.
With _sources.yml restored, setting input_database + input_schema
now points the same models at the user's own input layer when
use_synthetic_data is false. No model edits needed.
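The Python sketch below shows what the single SELECT ends up reading from in each mode. The Relation class and resolve_input are hypothetical; the fallback rule follows this PR's description (input_database / input_schema falling back to target.database / target.schema). Placing the seed in the target schema is an assumption, since seed schemas can be overridden in dbt configuration, and select * stands in for the models' real column lists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a dbt Relation: renderable or inspectable."""
    database: str
    schema: str
    identifier: str

    def __str__(self) -> str:
        return f"{self.database}.{self.schema}.{self.identifier}"


def resolve_input(table: str, use_synthetic_data: bool, target_database: str,
                  target_schema: str, input_database: Optional[str] = None,
                  input_schema: Optional[str] = None) -> Relation:
    if use_synthetic_data:
        # ref('raw_data__<table>'); assumes the seed is built in the target schema.
        return Relation(target_database, target_schema, f"raw_data__{table}")
    # source('source_input', table), with optional overrides falling back to target.*
    return Relation(input_database or target_database,
                    input_schema or target_schema, table)


rel = resolve_input("patient", False, "analytics", "dev", input_schema="raw")
print(f"select * from {rel}")
```

Because the model body only interpolates the returned relation, switching modes changes the FROM target and nothing else.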
…eprecation
dbt now flags custom top-level config keys; nest the seed batch_size hint under +meta so the deprecation warning goes away.
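The shape of this change in a seed config block can be sketched as a dictionary rewrite. The Python helper below is hypothetical and only illustrates the before/after structure; the batch_size value is made up, and how the package reads the hint back out of meta is not shown in this PR.

```python
def nest_custom_keys(config: dict, custom_keys: tuple[str, ...] = ("batch_size",)) -> dict:
    """Move custom '+key' entries under '+meta', leaving built-in configs alone.

    Before: {"+batch_size": 1000}            (flagged as a custom key)
    After:  {"+meta": {"batch_size": 1000}}
    """
    out = dict(config)
    meta = dict(out.get("+meta", {}))
    for key in custom_keys:
        if f"+{key}" in out:
            meta[key] = out.pop(f"+{key}")
    if meta:
        out["+meta"] = meta
    return out


print(nest_custom_keys({"+batch_size": 1000, "+enabled": True}))
```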
dbt 1.10+ deprecates generic-test arguments that are not nested under arguments: (MissingArgumentsPropertyInGenericTestDeprecation). Updated the hcc_recapture staging, intermediate, and final yml tests accordingly.
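The migration amounts to moving test parameters one level down while leaving test-level properties in place. The Python sketch below applies that rewrite to a parsed test entry. The set of properties kept at the top level is an assumption for this sketch, and the test and column names in the example are hypothetical.

```python
# Properties that stay at the test's top level; assumed set for this sketch.
RESERVED = {"arguments", "config", "name", "description"}


def nest_arguments(test: dict) -> dict:
    """Rewrite {test_name: {...}} so non-reserved keys move under 'arguments'."""
    [(test_name, props)] = test.items()
    kept = {k: v for k, v in props.items() if k in RESERVED}
    args = dict(kept.pop("arguments", {}))
    args.update({k: v for k, v in props.items() if k not in RESERVED})
    new_props = dict(kept)
    if args:
        new_props["arguments"] = args
    return {test_name: new_props}


before = {"dbt_utils.unique_combination_of_columns": {
    "combination_of_columns": ["person_id", "payer"],
    "config": {"severity": "warn"},
}}
print(nest_arguments(before))
```

In YAML terms, combination_of_columns moves under an arguments: key, while config: stays beside it.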
Problem
* Every bridge model had two SELECTs gated by `use_synthetic_data`, and their duplicated column lists drifted apart.
* The `raw_data__*` seeds were not wired as upstream dependencies of the bridge models, so `dbt build` sometimes ran models before seeds and failed with `Dataset raw_data not found`.
* … (`tuva_columns` / `tuva_extensions` / `tuva_metadata`) and ~500 copies of the same cross-adapter type jinja across seed yml files.
* `dbt parse` surfaced three deprecations and one BigQuery error (`cast(… as varchar)`).

Fix
* `tuva_source()` macro: returns `ref('raw_data__<table>')` in synthetic mode and `source('source_input', <table>)` otherwise. The compile-time `ref()` registers the seed → model DAG edge.
* … `tuva_columns` / `tuva_extensions` / `tuva_metadata` layout.
* Restored `models/_sources.yml` with optional `input_database` / `input_schema` (falling back to `target.database` / `target.schema`).
* Shared YAML anchors (`*string`, `*datetime`, `*float`) in both seed yml files.
* Gated the `raw_data__*` and `synthetic_data__*` seeds on `use_synthetic_data`.
* Renamed `patient_seed.csv` → `raw_data__patient.csv`; added header-only CSVs for the 6 missing clinical tables.
* Parse fixes: `+batch_size` → `+meta.batch_size`; `combination_of_columns` nested under `arguments:`; `varchar` → `{{ dbt.type_string() }}` in the extension tests.

How to test
* `dbt build --full-refresh` with `use_synthetic_data: true`: seeds run before bridge models, and the extension-column tests pass on BigQuery.
* `dbt build --full-refresh` with `use_synthetic_data: false` and `input_database` / `input_schema` pointed at a real input layer: bridge models read from `source_input`, and no synthetic seeds materialize.

Breaking changes
None. Tuva package contract unchanged.
Author: SnowQuery — Healthcare Data Engineering & Architecture Consulting